**** Nationalism, class, and status **** 
*Data preparation**

*** DISTRICT DATA *** 
*preparing district data for analysis*

clear all

import excel using "1907 district data.xlsx", sheet("Round 1") firstrow clear
keep  election	district district_name province mandvote illiteracy gini pop  gr* qualified ballots turnout p_* 


*calculate linguistic group shares in district population
*First: fix missing data in population statistics:

*For some districts, the source reports population only for the whole city, not for the electoral districts within it.
*These districts are coded pop == 0 and take the figure of the preceding district.
sort district
replace pop = pop[_n-1] if pop == 0 
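
*A possible check, not part of the original workflow: confirm that the carry-forward left no
*district with a zero or missing population. capture noisily reports a failed assertion
*without stopping the do-file.
capture noisily assert pop > 0 & !missing(pop)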

*Now the group shares.
*OeS provides no linguistic data for the electoral districts in LA, Salzburg, and UA
*(the code below treats Vorarlberg the same way),
*and the Moravia-German districts comprise voters registered in the German Wahlkataster,
*so we have to assume that all of these districts are 100% German.
*LA, Salzburg, and UA are essentially linguistically homogeneous provinces, so this is a
*fairly safe assumption, although it overlooks LA's small Czech-speaking minority,
*concentrated in Vienna.
*The census data used here are the census figures published with the electoral statistics.

replace grgerman = pop if province == "LA" 
replace grgerman = pop if province == "Salzburg" 
replace grgerman = pop if province == "UA" 
replace grgerman = pop if province == "Vorarlberg"

*Moravia: the election took place in separate communal districts; ethnic quotas applied.
*So anyone who voted in the Moravia-German or Moravia-Czech districts was,
*by self-declared ethnicity, either German or Czech.
replace grgerman = pop if province == "Moravia-German"
replace grczech = pop if province == "Moravia-Czech"


*Other cases are districts for which OeS only presented the population figures for an entire city, or for a large part of a city, 
*but not for each electoral district within that city. 
*The relevant districts are:
*Bohemia 2-4
*Bohemia 8-9
*Bohemia 10-11
*Bohemia 12-13
*Bohemia 14-15
*Bukowina 1-2
*Galicia 1-7
*Galicia 9-10
*Tyrol 1-2
*The lowest-numbered district in each group of districts should hold the relevant population data.
*In the xls sheet, "0" is entered only in these cases; all other gaps are left missing. So "0" identifies
*the values that need to be imputed.

*sort by district within province so that [_n-1] is always the preceding district
*(a plain "by province, sort:" leaves the order within a province undefined)
foreach x of varlist grcroat grczech grgerman gritalian grpolish grruthen grromanian grserb grsercro grslovene {
   bysort province (district): replace `x' = `x'[_n-1] if `x' == 0
}
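
*Sketch of a plausibility check (an addition, not in the original): after the imputation the
*linguistic groups should not add up to more than the district population. This assumes that
*every variable matching gr* is a numeric group count; the temporary variable is hypothetical.
tempvar grsum
egen `grsum' = rowtotal(gr*)
quietly count if `grsum' > pop & !missing(pop)
display "districts where group counts exceed pop: " r(N)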


*calculate group percentages of Czechs and Germans in each district as percentage of district population*
gen s_cze = grczech/pop
gen s_ger = grgerman/pop


*replace missings generated when calculating shares with 0: 
quietly foreach x of varlist s_* {
   replace `x'= 0  if `x' == .
}
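
*Possible sanity check (an addition): both shares should lie between 0 and 1.
*capture noisily reports a failed assertion without stopping the do-file.
capture noisily assert inrange(s_cze, 0, 1) & inrange(s_ger, 0, 1)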

sum s_*
order election province mandvote illiteracy gini district district_name  pop s_* gr* qualified ballots turnout p_*

drop grcroat gritalian grpolish grruthen grromanian grserb grsercro grslovene 

save "natclass.dta", replace


*add occupational data 
import excel using "1907 voter occupation", sheet("Sheet1") firstrow clear
destring number, replace
rename number num

*keep only the total voting population by district: "erschienen" (turned out) and "nicht erschienen" (did not turn out) together
*assumption: parties and voters did not know who would turn out; if anything, they estimated who lived in the district
keep if status == "z."
drop status

*To calculate vars indicating number of voters by profession by district: 
*go from long to wide format
reshape wide num, i(district_name) j(occ) string

*There should be occupational data for 480 districts total.
*Reshaping the occupational data yields 479 districts.
*Reason: one Galician district is already missing from the original k.u.k. Statistik source. This is unproblematic,
*since it is one of the Galician districts excluded from the analysis in this article anyway.
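
*Optional check (an addition): display the number of districts after the reshape;
*according to the note above it should be 479.
quietly count
display "districts with occupational data: " r(N)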

*first replace missings with 0 so that summing across occupational categories does not produce missing values

quietly foreach x of varlist num* {
   replace `x'= 0  if `x' == .
}



order district_name district_adlg

*distotal: voters per district; total: voters across all districts
egen distotal = rowtotal(num*)

egen total = total(distotal)

*Create variable for all workers (all with suffix _a in any district): 
gen workers = numBeAr_a + numHaVerEis_a + numHaVerPos_a + numHaVerSon_a + numInGeGro_a + numInGeKle_a + numLaFo_a + numOefFreAnd_a + numOefFreFre_a + numOefFreHof_a
 
 *create variable with only industrial workers
 *Following Bartolini's conceptualisation of industrial workers, we include workers in
 *"Industrie und Gewerbe", "Post Telegraphen Telefon", "Eisenbahn Tramway", "Sonstige Handels und Transportunternehmen"
gen indworkers = numInGeGro_a + numInGeKle_a + numHaVerPos_a + numHaVerEis_a + numHaVerSon_a

 *create variable with  agricultural working class: workers and Häusler
gen agrworkers = numLaFo_a + numLaFo_h
 
  *create variable with only Häusler - small farmers: 
 gen häusler = numLaFo_h
 
 *create a variable with the middle class: all self-employed, public employees, and Geistliche (clergy) across all sectors
 gen midclass = numLaFo_s + numLaFo_b +  numInGeGro_s + numInGeGro_b + numInGeKle_s + numInGeKle_b +  numHaVerPos_b + numHaVerPos_k + numHaVerEis_b + numHaVerEis_k + numHaVerSon_s + numHaVerSon_b + numBeAr_b + numOefFreHof_b + numOefFreHof_k + numOefFreAnd_b + numOefFreAnd_l + numOefFreAnd_g + numOefFreAnd_k + numOefFreFre_s + numOefFreFre_b

 *create a variable excluding LaFo middle class (excluding selbständige (that is farmers) or beamte in LaFo) 
  gen urbmidclass = numInGeGro_s + numInGeGro_b + numInGeKle_s + numInGeKle_b +  numHaVerPos_b + numHaVerPos_k + numHaVerEis_b + numHaVerEis_k + numHaVerSon_s + numHaVerSon_b + numBeAr_b + numOefFreHof_b + numOefFreHof_k + numOefFreAnd_b + numOefFreAnd_l + numOefFreAnd_g + numOefFreAnd_k + numOefFreFre_s + numOefFreFre_b

 *create a variable with all working in industrial sector by district 
 gen industry = numInGeGro_a + numInGeGro_s + numInGeGro_b + numInGeKle_a + numInGeKle_s + numInGeKle_b + numHaVerPos_a + numHaVerPos_k + numHaVerPos_b + numHaVerEis_a + numHaVerEis_k + numHaVerEis_b + numHaVerSon_a + numHaVerSon_s + numHaVerSon_b
 
  *create a variable with all working in agricultural sector 
gen agriculture = numLaFo_a + numLaFo_h + numLaFo_b + numLaFo_s
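
*Possible consistency check (an addition): the industrial and agricultural sectors are built from
*disjoint num* columns, so together they should not exceed the district total. This assumes
*that no num* count is negative.
capture noisily assert industry + agriculture <= distotal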

*create variables that give totals per sector 
egen totindustry = total(industry) 
egen totagriculture = total(agriculture)

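*calculate shares of industrial and agricultural sector among surveyed population
*(named share_*, not p_*, so that the later "reshape long p_" on the party vote columns does not pick them up as parties)
gen share_industry = totindustry / total
gen share_agriculture = totagriculture / total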

sum workers indworkers agrworkers  häusler midclass urbmidclass industry agriculture

drop num*

save "occupation.dta", replace


use "natclass.dta", clear

merge 1:1 district_name using "occupation.dta"
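
*Optional look at the merge result (an addition): tabulate _merge and list the districts
*that did not match, to confirm that they are the ones expected.
tabulate _merge
list district_name province if _merge != 3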

*those that could not be merged are the two-member districts in Galicia that we exclude from the analysis.
*Keep only those that could be merged
keep if _merge ==3
drop _merge

*calculate share of voters by sector as proportion of qualified voters 
gen s_worker = workers/qualified
gen s_indworker = indworkers/qualified
gen s_industry = industry/qualified
gen s_agrworker = agrworkers/qualified
gen s_häusler = häusler/qualified
gen s_midclass = midclass/qualified
gen s_urbmidclass = urbmidclass/qualified

sum s_*


*set missing shares to 0 (0 workers divided by qualified is already 0; a missing share arises only when qualified is 0 or missing)
quietly foreach x of varlist s_* {
   replace `x'= 0  if `x' == .
}
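
*Sketch of a plausibility check (an addition): the occupational counts cover voters who turned out
*and voters who did not, so each occupational share of qualified voters would be expected to stay
*at or below 1. Counting the exceptions flags districts worth a second look in the source.
foreach x of varlist s_worker s_indworker s_industry s_agrworker s_häusler s_midclass s_urbmidclass {
   quietly count if `x' > 1
   display "`x': " r(N) " districts with a share above 1"
}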


sort district
order election province mandvote illiteracy gini district district_name district_adlg pop s_* gr* qualified ballots turnout p_* 

save "natclass.dta", replace

*create a simplified district data to create the turnout figure (figure 1 in manuscript)
keep election district district_name province pop s_* qualified turnout
save "natclass_districtvars.dta", replace



*******************************************************************************

*** PARTY ELECTORAL DATA **** 
*create party level data set (parties and their election results nested in districts): 

use "natclass.dta", clear
reshape long p_, i(district) j(label) string
rename p_ votes

*change zero votes to missing: in the original electoral data zeros and missings mean the same thing
*(a zero was sometimes entered into the xls and sometimes not, but a blank always means zero votes in the district).
*We have no data on who ran, so we cannot tell who received zero votes from who simply did not run in a given district,
*but it is reasonable to assume that whoever ran got at least his own vote.

replace votes = . if votes == 0

*generate dependent variable voteshare as percentage of ballots cast*
gen voteshare = votes/ballots

*drop parties that did not gain any votes in a district;
*this shrinks the data to parties that gained at least one vote in that district
drop if voteshare == .
sum voteshare
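
*Possible check (an addition): within a district the vote shares should sum to at most 1,
*allowing a small tolerance for rounding. The temporary variable name is hypothetical.
tempvar vsum
bysort district: egen `vsum' = total(voteshare)
quietly count if `vsum' > 1.0001
display "rows in districts where vote shares sum to more than 1: " r(N)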


*count parties that gained at least one vote per district -> number of parties per district 
bysort district: gen np = _N
label variable np "num of parties"


order election province mandvote illiteracy gini district district_name district_adlg label votes voteshare np 

save "natclass.dta", replace


*add party labels and identifier "party1", so that electoral data and party manifesto data can later be merged

import excel using "Party labels.xlsx", sheet("import") firstrow clear
save "labels.dta", replace

use "natclass.dta", clear
merge m:1 label using "labels.dta"

*to inspect the ones that were not merged: l district_name label if _merge == 1
*result: those that could not be mapped onto a standardised party1 label are the non-German and non-Czech parties (e.g. Polish, Italian parties), so all is good

order election province mandvote illiteracy gini district district_name district_adlg label party1 partyname ger cze votes voteshare np 

sort district
drop _merge
save "natclass.dta", replace




*******************************************
****** ADDING MANIFESTO CODINGS ***********
**** for generating manifesto data, see manifesto prep do file **** 

use "natclass.dta", clear
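
*Optional check before the merge (an addition): see whether party1 uniquely identifies the rows
*of the manifesto data. preserve/restore leaves the data in memory untouched, and
*capture noisily reports a failure without stopping the do-file.
preserve
use "manifestos-short.dta", clear
capture noisily isid party1
restore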

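*m:1 rather than m:m: each party1 should appear only once in the manifesto data;
*merge stops with an error if party1 is not unique there (assumption to verify against the manifesto prep do file)
merge m:1 party1 using "manifestos-short.dta"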

*show and then drop parties for which we found documents but which we cannot match to a corresponding electoral label (other than by guessing)
l party1 if _merge == 2

     /* +-----------+
      |    party1 |
      |-----------|
1593. | gercatpeo |
1594. |    gercen |
1595. |    gercle |
1596. |    gerlib |
1597. |     other |
      +-----------+ */


drop if _merge == 2
drop _merge

sort district

*create numerical identifiers for the parties
sort district party1
egen party_id = group(party1)

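*share of a party's own linguistic group in the district: German share for German parties (ger == 1), Czech share otherwise
gen s_group = s_cze
replace s_group = s_ger if ger == 1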


order election province mandvote illiteracy gini district district_name district_adlg np turnout pop grczech ///
grgerman s_cze s_ger s_group distotal total workers indworkers agrworkers häusler midclass  urbmidclass ///
industry  agriculture totindustry  totagriculture s_worker  s_indworker s_industry  s_agrworker s_häusler ///
s_midclass  s_urbmidclass   party_id  label party1  partyname  voteshare  qualified  ballots 


save "natclass.dta", replace



*make a small data set with only nationalists' electoral results to create 
*map1 and map2 (using GIS coding)
use "natclass.dta", clear
keep if nationalist_majEA07_2 == 1 
keep if ger == 1
bysort district: egen gernatvote = total(voteshare)

bysort district: gen count = _n
keep if count == 1
sum gernatvote
keep election province district district_name district_adlg gernatvote

save "natvote.dta", replace 


use "natclass.dta", clear

keep if nationalist_majEA07_2 == 1 
keep if cze == 1
bysort district: egen czenatvote = total(voteshare)

bysort district: gen count = _n
keep if count == 1
sum czenatvote
keep election province district district_name district_adlg czenatvote

merge 1:1 district using "natvote.dta"
sort district 

replace czenatvote = 0 if czenatvote == .
replace gernatvote = 0 if gernatvote == .
drop _merge

save "natvote.dta", replace 
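
*Possible final check (an addition): the summed nationalist vote shares per district should lie
*between 0 and 1 for both groups; summarize gives a quick view of their range.
capture noisily assert inrange(czenatvote, 0, 1) & inrange(gernatvote, 0, 1)
summarize czenatvote gernatvote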

/*nationalist parties are: 
czecatnat
czenatsoc
czeradpro
czeradstarig
youcze
oldcze

geragr
gernat
gerpeo
gerpro
gerrad
pangersch
*/


